Clinton vs Trump

A Splunk analysis of independent expenditures in the 2016 presidential election


Presidential elections in the United States are always contentious. A battle of ideas for arguably the most powerful position on the planet more than justifies it. But the influence of money in these elections has become a source of great frustration.

The disproportionate influence of independent expenditures has always been controversial, but until relatively recently those dollars were all reported – who and where the money came from, how it was spent, and which candidate benefited from that spending. Since the Citizens United v. Federal Election Commission decision in 2010, however, independent spending by so-called "Super PACs" – with no reporting on who or where the money comes from – has risen to historic levels. So, how much money is truly being spent by these Super PACs? And who is winning the "independent" campaign?

As a follow-up to the Splunk FEC Explorer App from the 2012 election, this 2016 Presidential Election Influence explorer focuses on the flow of soft money. Who is spending for and against Clinton? Who is spending for and against Trump? And most importantly, how much are they spending over time? We bring the data to you using Splunk.

The wonderful data

Overall, independent expenditures in 2016 have indicated an overwhelmingly negative race, and Super-PAC spending has comprised a significant amount of that spending. However, that is not necessarily the case for both candidates.


Far more spending has been directed toward Trump than toward Clinton, but the majority of that money was spent in opposition. Independent spending on Clinton in the election to date has been roughly even between support and opposition.


But how is the money flowing in the race? Who is spending for and against each candidate? A bigger picture can be created by looking at the top spending committees (mostly Super PACs) supporting and opposing the presidential candidates in the chart below. The top 5 most-spending committees are shown per candidate per action (supporting or opposing).


How the data flows

The Federal Election Commission has for many years made data available on the funding of presidential elections in the United States. Data is readily available on the FEC website about who is contributing directly to candidates, how much, and from where. You can ask questions, search names, and even look at it all on pretty maps. What is harder to find, however, is accessible information on Super PACs. And that’s where Splunk comes in.

We have linked up to the latest FEC reports directly via the OpenFEC API (beta), which gives us access to the latest contribution and spending data as soon as it is updated. Ingesting it into Splunk is easy, and anyone can visualize the data themselves. Splunk has a number of out-of-the-box visualizations available, or you can design your own in d3 or other visualization libraries. The visualization displayed above is called "Halo"; it is built using d3.js and is available for use in Splunk at Splunkbase.

The technical discussion

by Satoshi Kawasaki

We used the OpenFEC site to look up Clinton's and Trump's candidate IDs: P00003392 and P80001571, respectively.

The API endpoint for independent expenditures is called Schedule E.
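As an illustration, a Schedule E request URL can be built like this. The endpoint path and parameter names follow the OpenFEC API documentation; DEMO_KEY is a placeholder, not a real API key:

```python
# Build a hypothetical Schedule E query URL for the OpenFEC API.
from urllib.parse import urlencode

BASE = "https://api.open.fec.gov/v1/schedules/schedule_e/"

def schedule_e_url(candidate_id, api_key="DEMO_KEY", per_page=100):
    """Return a Schedule E query URL for one candidate."""
    params = {"candidate_id": candidate_id,
              "api_key": api_key,
              "per_page": per_page}
    return BASE + "?" + urlencode(params)

print(schedule_e_url("P80001571"))  # Trump's Schedule E expenditures
```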

Each response returns an object called last_indexes that is needed to paginate through the full result set. Therefore we wrote a Python script on the Splunk server (different from this web server) to store and keep track of last_indexes and write the response to a file. The script also runs every hour to update the file and add new entries.
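That pagination step can be sketched roughly as follows. The helper names and the exact location of last_indexes inside the response are assumptions based on the OpenFEC API docs, not the authors' actual script:

```python
import json
import urllib.request
from urllib.parse import urlencode

API = "https://api.open.fec.gov/v1/schedules/schedule_e/"

def next_params(params, page):
    """Merge a response's last_indexes object into the query parameters
    so the next request resumes where this page left off."""
    merged = dict(params)
    merged.update(page.get("pagination", {}).get("last_indexes") or {})
    return merged

def fetch_all(params):
    """Yield every Schedule E result, page by page (hypothetical sketch).
    The real script also persists last_indexes between hourly runs and
    appends each page's results to the file that Splunk monitors."""
    while True:
        with urllib.request.urlopen(API + "?" + urlencode(params)) as resp:
            page = json.load(resp)
        results = page.get("results", [])
        if not results:
            return
        yield from results
        params = next_params(params, page)
```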

Each response contains up to 100 independent expenditure transactions, and we wanted each transaction to be its own event in Splunk. Through trial and error, we arrived at the following props.conf and transforms.conf settings:

# props.conf
[fec_schedule_e]
LINE_BREAKER = (\[|,|\R){
# Use expenditure_date, or use dissemination_date only when expenditure_date is null
TIME_PREFIX = expenditure_date":"|dissemination_date":"(?=.+?expenditure_date":null)|expenditure_date":null.+?dissemination_date":"
MAX_DAYS_AGO = 10951
TRANSFORMS-0 = fec_schedule_e_drop_events
SEDCMD-0 = s/"}],".+$/"}/
KV_MODE = json
LOOKUP-0 = candidates candidate_id
LOOKUP-1 = support_oppose_indicator support_oppose_indicator

# transforms.conf
[fec_schedule_e_drop_events]
REGEX = [^}]$
DEST_KEY = queue
FORMAT = nullQueue

[candidates]
filename = candidates.csv

[support_oppose_indicator]
filename = support_oppose_indicator.csv
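Outside Splunk, the TIME_PREFIX alternation can be sanity-checked with an ordinary regex engine. This sketch uses Python's re module on hand-made event fragments (Splunk uses PCRE, which behaves the same for this pattern):

```python
import re

# The same three-branch alternation as TIME_PREFIX in props.conf:
# match the text immediately preceding the timestamp Splunk should use.
TIME_PREFIX = (
    'expenditure_date":"'                                   # normal case
    '|dissemination_date":"(?=.+?expenditure_date":null)'   # expenditure_date null, appears later
    '|expenditure_date":null.+?dissemination_date":"'       # expenditure_date null, appears first
)

def extract_date(raw_event):
    """Return the 10-character date that follows the matched prefix."""
    m = re.search(TIME_PREFIX, raw_event)
    return raw_event[m.end():m.end() + 10] if m else None
```

With a non-null expenditure_date the first branch wins; when it is null, whichever fallback branch matches picks up dissemination_date instead, regardless of field order in the JSON.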

Finally, simple candidates.csv and support_oppose_indicator.csv files, referenced in the conf files above, map the candidate IDs to their names and the support/oppose flag to a readable label, respectively:
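The exact files aren't reproduced in the post, but based on the candidate IDs above and the FEC's support/oppose codes, they plausibly look like this (the candidate and toward output column names are inferred from the searches later in this post):

candidates.csv:

```csv
candidate_id,candidate
P00003392,clinton
P80001571,trump
```

support_oppose_indicator.csv:

```csv
support_oppose_indicator,toward
S,supporting
O,opposing
```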


With these settings, we let Splunk monitor the file created by the Python script. Whenever the script appends new transactions to the file, Splunk automatically ingests them. Each event arrives in Splunk as one transaction in valid JSON, with the event time set to expenditure_date (or dissemination_date if the former is null). Since the events are in JSON, Splunk automatically parses the fields as well:

[Screenshot: a parsed Schedule E event in Splunk]

The raw event for the image above is:

{"candidate_office":"P","office_total_ytd":773609.49,"notary_sign_date":null,"dissemination_date":"2016-08-19","candidate":{"candidate_id":"P80001571","two_year_period":2016.0,"idx":71849},"pdf_url":"http:\/\/\/cgi-bin\/fecimg\/?201608199022591132","payee_middle_name":null,"candidate_prefix":null,"payee_last_name":null,"report_type":"48","load_date":"2016-08-19T23:26:19+00:00","record_number":null,"receipt_date":"2016-08-19","notary_sign_name":null,"independent_sign_name":"SCHIFELING, DEIRDRE","filing_form":"F24","expenditure_date":"2016-08-19","election_type":"G2016","line_number":"24","candidate_middle_name":null,"filer_middle_name":null,"category_code":"004","candidate_suffix":null,"payee_zip":"94104","tran_id":"B623462","payee_state":"CA","file_number":1095966,"committee":{"state_full":null,"committee_type_full":"Super PAC (Independent Expenditure-Only)","organization_type_full":null,"expire_date":"2016-03-19T00:37:18+00:00","zip":"10038","street_2":null,"cycles":[2016,2010,2014,2012],"treasurer_name":"GUSTAFSON, LIZ","city":"NEW YORK","cycle":2016,"party":null,"party_full":null,"designation_full":"Unauthorized","name":"PLANNED PARENTHOOD VOTES","committee_id":"C00489799","street_1":"123 WILLIAM ST, 10TH FLOOR","committee_type":"O","state":"NY","designation":"U","organization_type":null,"candidate_ids":[]},"election_type_full":null,"notary_commission_expiration_date":null,"payee_prefix":null,"is_notice":true,"image_number":"201608199022591132","sched_e_sk":312466902,"back_reference_schedule_name":null,"back_reference_transaction_id":null,"payee_first_name":null,"filer_suffix":null,"update_date":null,"filer_prefix":null,"committee_id":"C00489799","payee_city":"SAN FRANCISCO","link_id":4081920161313050011,"support_oppose_indicator":"O","filer_last_name":"SCHIFELING","payee_suffix":null,"candidate_id":"P80001571","payee_name":"TERRIS BARNES & 
WALTERS","cand_office_state":"US","candidate_first_name":"DONALD","category_code_full":null,"expenditure_amount":1732.67,"payee_street_2":null,"expenditure_description":"CANVASS LIT-ESTIMATED COSTS","candidate_last_name":"TRUMP","report_primary_general":null,"filer_first_name":"DEIRDRE","cand_office_district":null,"committee_name":null,"payee_street_1":"400 MONTGOMERY ST # 700","report_year":2016,"independent_sign_date":"2016-08-19","candidate_name":"TRUMP, DONALD"}
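The monitoring setup itself is a one-stanza inputs.conf entry; the file path below is a placeholder, not the authors' actual path:

```ini
# inputs.conf (path is hypothetical)
[monitor:///opt/fec/schedule_e.json]
sourcetype = fec_schedule_e
```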

The next challenge is performance and stability in serving the Splunk data to the public. We don't want site visitors to overload the Splunk server by running searches for the FEC results on every page visit. Also, since the Schedule E data only updates every day or so, we needed a way to cache the results. Fortunately, Nginx serves static files efficiently, so we wrote another Python script that uses the Splunk Python SDK to run various searches against the Splunk server and save the results as files on the web server. The script runs every day. The following are the Splunk searches the script runs, with their current results:
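A minimal sketch of that caching pattern, assuming a run_search callable that wraps the Splunk Python SDK (the real script's structure isn't shown in the post):

```python
import json
import os
import tempfile

def cache_results(run_search, query, out_path):
    """Run a Splunk search via the supplied callable and atomically
    replace the cached JSON file that Nginx serves."""
    rows = run_search(query)  # e.g. a wrapper around splunklib's oneshot search
    # Write to a temp file in the same directory first, then rename,
    # so readers never observe a partially written file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(out_path)))
    with os.fdopen(fd, "w") as f:
        json.dump(rows, f)
    os.replace(tmp, out_path)
```

The atomic rename matters here because Nginx may be serving the cached file at the exact moment the daily refresh runs.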

All searches are from May 1st, 2015 to now

index=fec sourcetype=fec_schedule_e (candidate=clinton OR candidate=trump) (toward=supporting OR toward=opposing)
| stats sum(expenditure_amount) as spent by committee_id committee.committee_type_full toward candidate candidate_id
| sort 0 -spent
| streamstats count as rank by toward candidate
| eval committee_id=if(rank<=5, committee_id, "none")
| eval committee.name=if(rank<=5, 'committee.name', "others ".toward." ".candidate)
| eval committee.committee_type_full=if(rank<=5, 'committee.committee_type_full', "none")
| stats sum(spent) as spent by committee_id committee.committee_type_full toward candidate candidate_id

English translation: Give me the total spending of the top 5 committees per candidate per action (supporting or opposing). Group the rest as "others supporting/opposing a candidate".

Result for the above: schedule_e_stats.json

index=fec sourcetype=fec_schedule_e (candidate=clinton OR candidate=trump) (toward=supporting OR toward=opposing)
| eval id=candidate."_".candidate_id."_".toward
| timechart span=1w sum(expenditure_amount) by id
| fillnull

English translation: Give me the total spending supporting and opposing each candidate per week.

Result for the above: schedule_e_timechart.json

index=fec sourcetype=fec_schedule_e (candidate=clinton OR candidate=trump) (toward=supporting OR toward=opposing)
| head 1
| table _time
| eval now=now()

English translation: Give me the date of the most recent spending transaction (regardless of candidate) and the current time.

Result for the above: schedule_e_latest.json

The Python script also calls the Huffington Post API for 2016 presidential election poll results, which are saved as polls.json. The idea is to correlate poll results with Schedule E spending over time, as shown in the first chart.

With the data available, we developed a custom visualization called "Halo" in d3.js v4 that represents two sets of pie charts with a visual relationship between them. The Halo visualization relies on d3 modules such as d3-shape to compute the pie chart layouts.

In the d3 code, we heavily rely on underscore.js to further parse the JSON to a format that d3 likes. We use jQuery to do some minor DOM manipulation and Bootstrap for the front-end design. We modified a Bootstrap theme called Grayscale for this page. Lastly we use RequireJS to manage all the JavaScript libraries and to load the JSON files on the web server only once (as opposed to loading the same data for different visualizations).

Contact Splunk

Interested in learning more about Splunk and Splunk4Good? Visit us online, or contact Splunk to inquire about our products and solutions.